LINQ query language design

グラフ用のLINQクエリ言語の設計は、.NET Frameworkの既存のIQueryable<T>のサポートに基づいて行われました。ノードとエッジをモデル化するために、エッジからその上流および下流のノードへのナビゲーション、およびノードからその受信および送信エッジのコレクションへのナビゲーションをサポートするデータタイプが導入されました。これらをコレクションとしてモデル化することで、列挙型のクエリを容易に利用できるようになりました。ノードとエッジを横断するナビゲーションは、SelectMany 演算子の適用に集約され、これは Cross-Apply 演算に似ています。SelectMany 演算子は非常に基本的な演算子（モナディック関数型プログラミングの bind に相当）であり、実行計画や最適化を可能にするさまざまな優れた代数的特性を備えています。例えば、フィルタはこの演算子と相性が良く、その結果、述語のプッシュダウンが可能となり、上流のフィルタリングや、クエリプロセッサ間のデータ転送が少なくなります。例えば、以下のようになります。

The design of the LINQ query language for the graph was largely based on the existing IQueryable<T> support in the .NET Framework. In order to model nodes and edges, data types were introduced that support the navigation from an edge to its upstream and downstream node, and from a node to its collections of incoming and outgoing edges. By modeling these as collections, enumerable style querying became readily available. Navigation across nodes and edges then boiled down to a SelectMany operator application, which is akin to a cross-apply operation. Given that the SelectMany operator is a very fundamental operator (equivalent to bind in monadic functional programming), it has various nice algebraic properties that enable execution planning and optimization. For example, filters commute well with this operator, thus allowing for predicate pushdown which results in upstream filtering and less data transfer between query processors. For example:

code:C#

from actor in graph.Nodes<Actor>()

from actedIn in actor.Edges<ActedIn>()

let actor = actedIn.Actor

let movie = actedIn.Movie

where actor.Name == "Nicole Kidman" && movie.Year < 2010

select movie

上記は以下のように最適化できます。

can be optimized into:

code:C#

from actor in graph.Nodes<Actor>()

where actor.Name == "Nicole Kidman"

from actedIn in actor.Edges<ActedIn>()

let movie = actedIn.Movie

where movie.Year < 2010

select movie”

Neo4jのCypherやSPARQLのような代替グラフ問い合わせ言語は、LINQベースの問い合わせ演算子代数の上に重ねることができ、グラフをノードとエッジのコレクションとしてモデル化し、その間のトラバーサルパターンを認識することができます。このようなグラフ問い合わせ言語は、エッジを介したノード間のトラバーサルのための構文糖を導入することで、単に構文の簡潔性に焦点を当てています。

Alternative graph query languages such a Neo4j’s Cypher and SPARQL can be layered on top of the LINQ-based query operator algebra, recognizing a graph can be modeled as a collection of nodes and edges with traversal patterns in between. Such graph query languages merely focus on conciseness in syntax by introducing syntactic sugar for node-to-node traversal via edges:

code:Graph

MATCH (nicole:Actor {name: 'Nicole Kidman'})-:ACTED_IN->(movie:Movie)

WHERE movie.year < 2010

RETURN movie

グラフを問い合わせ可能なコレクションとしてモデル化することは、LINQ to XMLがXMLを要素、属性などのコレクションとしてモデル化し、Children、Descendantsなどの豊富なトラバーサル・パターンを提供するアプローチと非常によく似ています。ある意味、このアプローチは、テーブル指向のデータベースシステムでカーソルを使って移動するのと同じように、グラフデータベースのデータにアクセスするための非常にトラバーサル指向の方法を提供します。このような理想的な走査パターンを認識し、それを効率的な実行プランに変換することにより（後述）、LINQオペレータの非常に基本的な代数的特性の恩恵を受けつつ、さらに多くのドメイン固有のグラフ問い合わせ言語を表現することができる、適切な問い合わせオプティマイザを構築することができます。これは、多くのフロントエンド言語のサポートを共通の中間表現の上に重ね、その下に最適化された実行バックエンドを持つ、伝統的なコンパイラと実行計画のパイプラインに過ぎません。

Modeling a graph as queryable collections is very similar to LINQ to XML’s approach of modeling XML as collections of elements, attributes, etc. with rich traversal patterns such as Children, Descendants, etc. In a way, this approach provides a very traversal-oriented way to accessing data in a graph database, in some way similar to using cursors to navigate a table-oriented database system. By recognizing such ideomatic traversal patterns and translating them to efficient execution plans (see below), a decent query optimizer can be built that can also benefit from very fundamental algebraic properties of LINQ operators, while also enabling expressing many more domain-specific graph query languages on top of it. This is merely a traditional compiler and execution planning pipeline with support for many front-end languages layered on top of a common intermediate representation with an optimized execution back-end underneath.